To address the low recognition precision and the recognition difficulty of existing one-stage anchor-free detectors in generic object detection scenarios, a high-precision object detection algorithm based on an improved VarifocalNet (VFNet) was proposed. Firstly, the ResNet backbone network used for feature extraction in VFNet was replaced by the Recurrent Layer Aggregation Network (RLANet), whose recurrent residual connections pass the features of previous layers into subsequent layers to improve the representation ability of the features. Next, the original feature fusion network was replaced by a Feature Pyramid Network (FPN) with a feature alignment convolution operation, which uses deformable convolution during the fusion of the upper and lower FPN layers to align the features and improve feature quality. Finally, the Focal-Global Distillation (FGD) algorithm was used to further improve the detection performance of the small-scale algorithm. Experimental results on the COCO (Common Objects in Context) 2017 dataset show that, under the same training conditions, the improved algorithm with RLANet-50 as the backbone achieves a mean Average Precision (mAP) of 45.9%, which is 4.3 percentage points higher than that of the VFNet algorithm, and has 36.67×10^6 parameters, which is only 4×10^6 more than the VFNet algorithm. The improved VFNet algorithm increases the number of parameters only slightly while improving the detection accuracy, indicating that it can meet the requirements of lightweight and high-precision object detection.
In the stock market, investors can predict future stock returns by capturing the potential trading patterns in historical data. The key issue in predicting stock returns is how to identify these trading patterns accurately. However, they are generally difficult to capture because of uncertain factors such as corporate performance, financial policies, and national economic growth. To solve this problem, a Multi-Scale Kernel Adaptive Filtering (MSKAF) method was proposed to capture multi-scale trading patterns from past market data. In this method, to describe the multi-scale features of stocks, the Stationary Wavelet Transform (SWT) was employed to obtain data components at different scales, which contain the different trading patterns hidden in stock price fluctuations. Then, Kernel Adaptive Filtering (KAF) was used to capture the trading patterns at different scales and predict the future stock return. Experimental results show that, compared with the prediction model based on Two-Stage KAF (TSKAF), the proposed method reduces the Mean Absolute Error (MAE) by 10% and increases the Sharpe Ratio (SR) by 8.79%, verifying that the proposed method achieves better stock return prediction performance.
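The decompose-then-filter idea above can be illustrated with a minimal sketch. It is not the MSKAF method itself: it assumes a one-level undecimated Haar-style split (averaging normalization, circular boundary) as a stand-in for SWT, and Kernel Least Mean Squares (KLMS) with a Gaussian kernel as one common member of the KAF family. The names `haar_swt_level1` and `KLMS`, the step size, and the kernel width are all hypothetical choices for illustration.

```python
import math


def haar_swt_level1(series):
    """One-level undecimated Haar-style split (illustrative stand-in for SWT).

    Uses the current and previous sample (circular at the start), so that
    approx[i] + detail[i] == series[i] and the output has the input length.
    """
    approx = [(series[i] + series[i - 1]) / 2.0 for i in range(len(series))]
    detail = [(series[i] - series[i - 1]) / 2.0 for i in range(len(series))]
    return approx, detail


def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class KLMS:
    """Kernel Least Mean Squares: an online kernel adaptive filter."""

    def __init__(self, step=0.5, sigma=1.0):
        self.step = step      # learning rate (assumed value)
        self.sigma = sigma    # Gaussian kernel width (assumed value)
        self.centers = []     # stored input vectors
        self.coeffs = []      # one coefficient per stored input

    def predict(self, x):
        return sum(c * gaussian_kernel(center, x, self.sigma)
                   for center, c in zip(self.centers, self.coeffs))

    def update(self, x, target):
        """Predict, then store x with a coefficient proportional to the error."""
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.coeffs.append(self.step * error)
        return error


if __name__ == "__main__":
    prices = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
    approx, detail = haar_swt_level1(prices)
    # One filter per scale; each predicts the next value of its component.
    for name, comp in (("approx", approx), ("detail", detail)):
        f = KLMS()
        for t in range(2, len(comp)):
            f.update(comp[t - 2:t], comp[t])
        print(name, f.predict(comp[-2:]))
```

Running one filter per component and summing the per-scale predictions is one plausible way to combine them; the abstract does not state how the original method merges the scales.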
In recent years, the Grid-based distributed Xin’anjiang hydrological Model (GXM) has played an important role in flood forecasting. However, when simulating the flooding process, because of the vast amount of data and calculation in the model, the computing time of GXM increases exponentially as the model warm-up period grows, which seriously affects the computational efficiency of GXM. Therefore, a parallel computing algorithm for GXM based on grid flow direction division and dynamic priority Directed Acyclic Graph (DAG) scheduling was proposed. Firstly, the model parameters, model components, and model calculation process were analyzed. Secondly, a parallel algorithm for GXM based on grid flow direction division was proposed from the perspective of spatial parallelism to improve the computational efficiency of the model. Finally, a DAG task scheduling algorithm based on dynamic priority was proposed, which constructs the DAG of grid computing nodes and dynamically updates the priorities of the computing nodes to schedule tasks during GXM computation, thereby reducing data skew in the model calculation. Experimental results on the Dali River basin in Shaanxi Province and the Tunxi basin in Anhui Province show that, compared with the traditional serial computing method, the maximum speedup ratio of the proposed algorithm reaches 4.03 and 4.11 respectively, and that the computing speed and resource utilization of GXM are effectively improved when the warm-up period is 30 days and the data resolution is 1 km.
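The scheduling idea can be sketched as follows, under stated assumptions. Each grid cell drains into at most one downstream cell, so a cell can only be computed after all of its upstream cells; cells with no pending upstream work are "ready" and may run in parallel. The priority rule used here (cells farther from the outlet first) and the unit task time are assumptions for illustration; the abstract does not specify the paper's actual priority function. The function name `schedule` and the cell labels are hypothetical.

```python
import heapq


def schedule(downstream, workers):
    """Group grid cells into parallel rounds that respect flow direction.

    downstream: dict mapping each cell to its downstream cell (None = outlet).
    workers:    number of cells that can be computed in one round.
    Returns a list of rounds, each a list of cells computed together.
    """
    # Count how many upstream cells each cell is still waiting for.
    pending = {cell: 0 for cell in downstream}
    for cell, down in downstream.items():
        if down is not None:
            pending[down] += 1

    def distance_to_outlet(cell):
        steps = 0
        while downstream[cell] is not None:
            cell = downstream[cell]
            steps += 1
        return steps

    # Max-priority via negated key: cells far from the outlet run first.
    ready = [(-distance_to_outlet(c), c) for c in downstream if pending[c] == 0]
    heapq.heapify(ready)

    rounds = []
    while ready:
        batch = [heapq.heappop(ready)[1]
                 for _ in range(min(workers, len(ready)))]
        rounds.append(batch)
        # Finishing a cell may release its downstream cell into the ready set.
        for cell in batch:
            down = downstream[cell]
            if down is not None:
                pending[down] -= 1
                if pending[down] == 0:
                    heapq.heappush(ready, (-distance_to_outlet(down), down))
    return rounds


if __name__ == "__main__":
    # A and B drain into C; C and D drain into the outlet E.
    flow = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
    print(schedule(flow, workers=2))
```

With two workers the five cells finish in three rounds instead of five serial steps, which is the kind of gain spatial parallelism over the flow-direction DAG is meant to provide.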
To solve problems such as the insufficient search ability and low search efficiency of the Heap-Based Optimizer (HBO) on complex problems, a Differential disturbed HBO (DDHBO) was proposed. Firstly, a random differential disturbance strategy was proposed to update the best individual’s position, addressing the low search efficiency caused by HBO not updating this individual. Secondly, a best-worst differential disturbance strategy was used to update the worst individual’s position and strengthen its search ability. Thirdly, the ordinary individuals’ positions were updated by a multi-level differential disturbance strategy to strengthen information exchange among individuals across multiple levels and improve the search ability. Finally, a dimension-based differential disturbance strategy was proposed for the other individuals to improve the probability of obtaining effective solutions in the initial stage of the original updating model. Experimental results on a large number of complex functions from CEC2017 show that, compared with HBO, DDHBO has better optimization performance on 96.67% of the functions and a shorter average running time (3.4450 s), and that DDHBO also has significant advantages over other state-of-the-art algorithms such as Worst opposition learning and Random-scaled differential mutation Biogeography-Based Optimization (WRBBO), Differential Evolution and Biogeography-Based Optimization (DEBBO), and Hybrid Particle Swarm Optimization and Grey Wolf Optimizer (HGWOP).
Handwritten text recognition technology can transcribe handwritten documents into editable digital documents. However, because of differing writing styles, ever-changing document structures and the low accuracy of character segmentation, handwritten English text recognition based on neural networks still faces many challenges. To solve these problems, a handwritten English text recognition model based on Convolutional Neural Network (CNN) and Transformer was proposed. Firstly, the CNN was used to extract features from the input image. Then, the features were input into the Transformer encoder to obtain a prediction for each frame of the feature sequence. Finally, the Connectionist Temporal Classification (CTC) decoder was used to obtain the final prediction result. A large number of experiments were conducted on the public Institut für Angewandte Mathematik (IAM) handwritten English word dataset. Experimental results show that the model obtains a Character Error Rate (CER) of 3.60% and a Word Error Rate (WER) of 12.70%, which verifies the feasibility of the proposed model.
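The last decoding step and the CER metric can be illustrated with a short sketch. It assumes greedy (best-path) CTC decoding, which takes the most likely label per frame, collapses consecutive repeats, and drops blanks; the abstract does not say whether the model uses greedy or beam-search decoding. CER is computed here as Levenshtein distance divided by reference length. The function names and the `-` blank symbol are hypothetical.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Collapse per-frame labels: merge consecutive repeats, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(reference, hypothesis):
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            row.append(min(prev_row[j] + 1,            # deletion
                           row[j - 1] + 1,             # insertion
                           prev_row[j - 1] + (ref_item != hyp_item)))
        prev_row = row
    return prev_row[-1]


def character_error_rate(reference, hypothesis):
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    frames = "hh-e-l-lo-"          # one predicted label per feature frame
    word = "".join(ctc_greedy_decode(frames))
    print(word, character_error_rate("hello", word))
```

The blank between the two `l` frames is what lets CTC emit a doubled letter; without it the repeats would collapse into one. WER follows the same formula with word lists in place of character strings.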
As the international language, English characters appear on many public occasions, as do Chinese pinyin characters in Chinese environments. When these characters appear in an image, especially an image with a complex style, it is difficult to edit and modify them directly. To solve this problem, an image character editing method based on an improved character generation network named Font Adaptive Neural network (FANnet) was proposed. Firstly, the saliency detection algorithm based on Histogram Contrast (HC) was used to improve the Character Adaptive Detection (CAD) model, so as to accurately extract the image characters selected by the user. Secondly, FANnet was used to generate a binary image of the target character whose font is almost consistent with that of the source character. Then, the color of the source characters was transferred to the target characters by the proposed Colors Distribute-based Local (CDL) transfer model based on color complexity discrimination. Finally, target editable characters highly consistent with the font structure and color change of the source character were generated, achieving the purpose of character editing. Experimental results show that, on the MSRA-TD500, COCO-Text and ICDAR datasets, the average Structural SIMilarity (SSIM), Peak Signal-to-Noise Ratio (PSNR) and Normalized Root Mean Square Error (NRMSE) of the proposed method are 0.7765, 18.3211 dB and 0.4358 respectively. Compared with the Scene Text Editor using Font Adaptive Neural Network (STEFANN) algorithm, SSIM and PSNR are increased by 18.59% and 14.02% respectively and NRMSE is decreased by 2.97%; compared with the multi-modal few-shot font style transfer model Multi-Content GAN (MC-GAN) with 1 input character, SSIM and PSNR are increased by 30.24% and 23.92% respectively and NRMSE is decreased by 4.68%. For image characters with complex font structure and color gradient distribution in real scenes, the editing effect of the proposed method is also good. The proposed method can be applied to image reuse, automatic computer error correction of image characters and restoration of image text information.
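Two of the reported metrics are simple enough to sketch directly. The example below computes PSNR and NRMSE on flattened pixel lists, assuming 8-bit images (peak value 255) and NRMSE normalized by the root mean square of the reference image. NRMSE has several normalization conventions (range, mean, Euclidean norm) and the abstract does not say which one was used, so that choice is an assumption. SSIM is omitted because it needs windowed local statistics.

```python
import math


def mean_squared_error(reference, test):
    return sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)


def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer to the reference."""
    mse = mean_squared_error(reference, test)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def nrmse(reference, test):
    """RMSE divided by the RMS of the reference (one of several conventions)."""
    rms_reference = math.sqrt(sum(r * r for r in reference) / len(reference))
    return math.sqrt(mean_squared_error(reference, test)) / rms_reference


if __name__ == "__main__":
    source = [0, 255, 255, 0]      # flattened reference pixels
    edited = [0, 255, 245, 10]     # flattened generated pixels
    print(psnr(source, edited), nrmse(source, edited))
```

This also makes the direction of the reported changes clear: a better edit raises SSIM and PSNR but lowers NRMSE, which is why the abstract reports two increases and one decrease.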
Most current Chinese question-answer matching technologies require word segmentation first, and word segmentation of Chinese medical text requires maintaining medical dictionaries to reduce the impact of segmentation errors on subsequent tasks. However, maintaining dictionaries requires a lot of manpower and knowledge, so word segmentation remains a great challenge. At the same time, existing Chinese medical question-answer matching methods all model the questions and the answers separately, and do not consider the relationship between the keywords contained in the questions and those contained in the answers. Therefore, an Attention mechanism based Stack Convolutional Neural Network (Att-StackCNN) model was proposed for Chinese medical question-answer matching. Firstly, character embedding was used to encode the questions and answers to obtain their respective character embedding matrices. Then, an attention matrix was constructed from the character embedding matrices of the questions and answers to obtain their respective feature attention mapping matrices. After that, the Stack Convolutional Neural Network (Stack-CNN) model was used to convolve the above matrices simultaneously to obtain the respective semantic representations of the questions and answers. Finally, the similarity was calculated and used to compute the max-margin loss for updating the network parameters. On the cMedQA dataset, the Top-1 accuracy of the proposed model is about 1 percentage point higher than that of the Stack-CNN model and about 0.5 percentage points higher than that of the Multi-CNNs model. Experimental results show that the Att-StackCNN model can improve the matching of Chinese medical questions and answers.
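The final training step can be sketched as follows. This is a minimal illustration of a max-margin (hinge) ranking loss over a question, a correct answer and an incorrect answer, assuming cosine similarity between the semantic representations and a margin of 0.1. Both the similarity function and the margin value are assumptions; the abstract names the loss but not these details.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_margin_loss(question, positive, negative, margin=0.1):
    """Hinge ranking loss: zero once the correct answer beats the wrong one
    by at least `margin` in similarity to the question."""
    sim_pos = cosine_similarity(question, positive)
    sim_neg = cosine_similarity(question, negative)
    return max(0.0, margin - sim_pos + sim_neg)


if __name__ == "__main__":
    q = [1.0, 0.0]                 # question representation
    good = [0.9, 0.1]              # representation of the correct answer
    bad = [0.2, 0.8]               # representation of a sampled wrong answer
    print(max_margin_loss(q, good, bad))   # well separated, so no loss
    print(max_margin_loss(q, bad, good))   # wrong ranking, positive loss
```

Because the loss is zero for already well-ranked triples, only the pairs the model still confuses contribute gradient, which is the usual reason this loss is chosen for answer ranking.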
For the positioning of mobile robots in indoor environments, the emerging auxiliary positioning technology based on Visual Inertial Odometry (VIO) is heavily limited by lighting conditions and cannot work in dark environments, while Ultra-Wide Band (UWB)-based positioning methods are easily affected by Non-Line Of Sight (NLOS) error. To solve these problems, an indoor mobile robot positioning algorithm based on the combination of UWB and VIO was proposed. Firstly, the S-MSCKF (Stereo-Multi-State Constraint Kalman Filter) algorithm was used to obtain the position output by VIO, and the DS-TWR (Double Side-Two Way Ranging) algorithm together with the trilateral positioning method was used to obtain the position resolved by UWB. Then, the motion equation and observation equation of the position measurement system were established. Finally, the optimal position estimate of the robot was obtained by fusing the data with the Error State-Extended Kalman Filter (ES-EKF) algorithm. A mobile positioning platform was built to verify the combined positioning method in different indoor environments. Experimental results show that, in an indoor environment with obstacles, the proposed algorithm reduces the maximum error of overall positioning by about 4.4% and the mean square error of overall positioning by about 6.3% compared with the positioning method using only UWB, and reduces the maximum error by about 31.5% and the mean square error by about 60.3% compared with the positioning method using only VIO. It can be seen that the proposed algorithm can provide real-time, accurate and robust positioning results for mobile robots in indoor environments.
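The UWB branch of the pipeline can be sketched in two steps: ranging, then trilateration. The example below assumes the commonly used asymmetric double-sided two-way ranging formula for time of flight and a planar (2D) setup with exactly three anchors, solved by subtracting the first range equation from the other two to get a linear system. The anchor layout, function names and the 2D simplification are assumptions for illustration, not details from the paper.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def ds_twr_distance(t_round1, t_reply1, t_round2, t_reply2):
    """Distance from double-sided two-way ranging timestamps (seconds)."""
    time_of_flight = ((t_round1 * t_round2 - t_reply1 * t_reply2)
                      / (t_round1 + t_round2 + t_reply1 + t_reply2))
    return time_of_flight * SPEED_OF_LIGHT


def trilaterate_2d(anchors, distances):
    """Position from three anchors [(x, y), ...] and their measured ranges."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Subtracting circle 1 from circles 2 and 3 gives A @ [x, y] = b.
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("anchors are collinear; position is not unique")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    truth = (1.0, 2.0)
    ranges = [math.dist(truth, a) for a in anchors]
    print(trilaterate_2d(anchors, ranges))
```

An NLOS obstacle lengthens one of the measured ranges, which shifts this solution; that bias is the error the fusion with VIO in the abstract is meant to suppress.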
The mixed noise formed by a large number of spikes, speckles and multi-directional stripe errors in Shuttle Radar Topography Mission (SRTM) data seriously interferes with subsequent applications. To solve this problem, a Low-Rank Group Sparsity_Total Variation (LRGS_TV) algorithm was proposed. Firstly, the uniqueness of the data in the local low-rank direction was used to regularize the global multi-directional stripe error structure, and the variational idea was used to impose unidirectional constraints. Secondly, the non-local self-similarity of the weighted nuclear norm was used to eliminate the random noise, and Total Variation (TV) regularization was combined to constrain the data gradient, so as to reduce the difference in local range changes. Finally, the low-rank group sparse model was solved by the Alternating Direction Method of Multipliers (ADMM) to ensure the convergence of the model. Quantitative evaluation shows that, compared with four algorithms, namely TV, Unidirectional Total Variation (UTV), Low-Rank-based Single-Image Decomposition (LRSID) and the Low-Rank Group Sparsity (LRGS) model, the proposed LRGS_TV achieves a Peak Signal-to-Noise Ratio (PSNR) of 38.53 dB and a Structural SIMilarity (SSIM) of 0.97, both better than those of the comparison algorithms. At the same time, the slope and aspect results show that the subsequent applications of the data are significantly improved after LRGS_TV processing. The experimental results show that the proposed LRGS_TV repairs the original data better while keeping the terrain contour features basically unchanged, and can provide important support for improving the reliability and subsequent applications of SRTM.
For the localization and static semantic mapping problems in dynamic scenes, a Simultaneous Localization And Mapping (SLAM) algorithm for dynamic scenes based on semantic and optical flow constraints was proposed to reduce the impact of moving objects on localization and mapping. Firstly, for each input frame, the masks of the objects in the frame were obtained by semantic segmentation, and the feature points that did not meet the epipolar constraint were filtered out by the geometric method. Secondly, the dynamic probability of each object was calculated by combining the object masks with the optical flow, the feature points were filtered by the dynamic probabilities to obtain the static feature points, and the static feature points were used for the subsequent camera pose estimation. Then, the static point cloud was created based on RGB-D images and object dynamic probabilities, and the semantic octree map was built by combining the semantic segmentation. Finally, the sparse semantic map was created based on the static point cloud and the semantic segmentation. Test results on the public TUM dataset show that, in highly dynamic scenes, the proposed algorithm improves both the absolute trajectory error and the relative pose error by more than 95% compared with ORB-SLAM2, and reduces the absolute trajectory error by 41% and 11% compared with DS-SLAM and DynaSLAM respectively, which verifies that the proposed algorithm has better localization accuracy and robustness in highly dynamic scenes. The mapping experiments show that the proposed algorithm creates a static semantic map, and that the storage space required by the sparse semantic map is 99% lower than that of the point cloud map.
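The geometric filtering step can be sketched as a point-to-epipolar-line distance test. Given a fundamental matrix F between two frames, a static point p1 in the first frame must lie on the line F·p1 in the second frame; matches that land far from that line are likely on moving objects. The sketch assumes F is already known and uses a hypothetical 1-pixel threshold; how the paper estimates F and sets its threshold is not stated in the abstract.

```python
import math


def epipolar_distance(fundamental, point1, point2):
    """Distance (pixels) from point2 to the epipolar line of point1.

    fundamental: 3x3 matrix as nested lists, mapping frame-1 points to
                 epipolar lines in frame 2.
    point1, point2: matched (x, y) pixel coordinates in frames 1 and 2.
    """
    p1 = (point1[0], point1[1], 1.0)
    # Epipolar line a*x + b*y + c = 0 in frame 2.
    a, b, c = (sum(row[k] * p1[k] for k in range(3)) for row in fundamental)
    return abs(a * point2[0] + b * point2[1] + c) / math.hypot(a, b)


def filter_static_matches(fundamental, matches, threshold=1.0):
    """Keep only matches consistent with the epipolar constraint."""
    return [(p1, p2) for p1, p2 in matches
            if epipolar_distance(fundamental, p1, p2) <= threshold]


if __name__ == "__main__":
    # Pure sideways camera translation with identity intrinsics:
    # static points may shift in x but must keep the same y.
    F = [[0.0, 0.0, 0.0],
         [0.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
    matches = [((2.0, 3.0), (5.0, 3.0)),    # consistent: static point
               ((2.0, 3.0), (5.0, 7.0))]    # off the line: moving point
    print(filter_static_matches(F, matches))
```

A point moving along its own epipolar line passes this test, which is one reason a purely geometric check is combined with semantic masks and optical flow in the approach described above.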